Google's MapReduce programming model - Revisited

نویسنده

  • Ralf Lämmel
چکیده

Google’s MapReduce programming model serves for processing and generating large data sets in a massively parallel manner (subject to a suitable implementation of the model). We deliver the first rigorous description of the model. To this end, we reverse-engineer the seminal MapReduce paper and we capture our observations, assumptions and recommendations as an executable specification. We also identify and resolve some obscurities of the informal presentation in the seminal MapReduce paper. We use typed functional programming (specifically Haskell) as a framework for design recovery and executable specification. Our development comprises three components: (i) the basic program skeleton that underlies MapReduce computations; (ii) the opportunities for parallelism in executing MapReduce computations; (iii) a higher-level approach to MapReduce-like computations, inspired by the aggregators in Google’s domain-specific language Sawzall. Our development does not formalize the more implementational aspects of an actual, distributed execution of MapReduce computations.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

PigSPARQL: Übersetzung von SPARQL nach Pig Latin

Dieser Beitrag untersucht die effiziente Auswertung von SPARQLAnfragen auf großen RDF-Datensätzen. Zum Einsatz kommt hierfür das Apache Hadoop Framework, eine bekannte Open-Source Implementierung von Google's MapReduce, das massiv parallelisierte Berechnungen auf einem verteilten System ermöglicht. Zur Auswertung von SPARQL-Anfragen mit Hadoop wird in diesem Beitrag PigSPARQL, eine Übersetzung ...

متن کامل

Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming

The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...

متن کامل

SOLVING FUZZY LINEAR PROGRAMMING PROBLEMS WITH LINEAR MEMBERSHIP FUNCTIONS-REVISITED

Recently, Gasimov and Yenilmez proposed an approach for solving two kinds of fuzzy linear programming (FLP) problems. Through the approach, each FLP problem is first defuzzified into an equivalent crisp problem which is non-linear and even non-convex. Then, the crisp problem is solved by the use of the modified subgradient method. In this paper we will have another look at the earlier defuzzifi...

متن کامل

HPCC Systems : Data Intensive Supercomputing Solutions

Executive Summary As a result of the continuing information explosion, many organizations are drowning in data and the resulting " data gap " or inability to process this information and use it effectively is increasing at an alarming rate. Data-intensive computing represents a new computing paradigm which can address the data gap using scalable parallel processing and allow government and comm...

متن کامل

Transparent System Call Based Performance Debugging for Cloud Computing

Problem Diagnosis and debugging in concurrent environments such as the cloud and popular distributed systems frameworks has been a traditionally hard problem. We explore an evaluation of a novel way of debugging distributed systems frameworks by using system calls. We focus on Google's MapReduce framework, which enables distributed, data-intensive, parallel applications by decomposing a massive...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Sci. Comput. Program.

دوره 70  شماره 

صفحات  -

تاریخ انتشار 2008